Semantic Modelling of Organizational Knowledge as a Basis for Enterprise Data Governance 4.0 -- Application to a Unified Clinical Data Model

Oliveira, Miguel AP, Manara, Stephane, Molé, Bruno, Muller, Thomas, Guillouche, Aurélien, Hesske, Lysann, Jordan, Bruce, Hubert, Gilles, Kulkarni, Chinmay, Jagdev, Pralipta, Berger, Cedric R.

arXiv.org Artificial Intelligence

Individuals and organizations cope with an ever-growing amount of data, heterogeneous in both content and format. An adequate data management process, yielding data quality and control over its lifecycle, is a prerequisite to getting value out of this data and minimizing the inherent risks of its many uses. Common data governance frameworks rely on people, policies, and processes that fall short of the overwhelming complexity of data. Yet, harnessing this complexity is necessary to achieve high-quality standards, which will condition the outcome of any downstream data usage, including generative artificial intelligence trained on this data. In this paper, we report our concrete experience establishing a simple, cost-efficient framework that enables metadata-driven, agile and (semi-)automated data governance (i.e. Data Governance 4.0). We explain how we implement and use this framework to integrate 25 years of clinical study data at enterprise scale in a fully productive environment. The framework encompasses both methodologies and technologies leveraging semantic web principles. We built a knowledge graph describing avatars of data assets in their business context, including governance principles. Multiple ontologies, articulated by an enterprise upper ontology, enable key governance actions such as FAIRification, lifecycle management, definition of roles and responsibilities, lineage across transformations, and provenance from source systems. This metadata model is the keystone to Data Governance 4.0: a semi-automated data management process that considers the business context in an agile manner, adapting governance constraints to each use case and dynamically tuning them as the business changes.
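The paper's central artifact is a knowledge graph of data-asset "avatars" whose edges capture governance metadata such as ownership and provenance. As a rough conceptual illustration only (the asset, predicate, and namespace names below are invented, not the authors' actual ontology), a minimal triple store can record an asset's business context and let governance actions like provenance tracing run as simple graph walks:

```python
# Minimal illustration of a metadata knowledge graph as a list of
# (subject, predicate, object) triples. All names here are invented.
triples = [
    ("study:CT-2021-001", "rdf:type", "gov:ClinicalDataset"),
    ("study:CT-2021-001", "gov:hasOwner", "person:data_steward_A"),
    ("study:CT-2021-001", "gov:derivedFrom", "source:legacy_EDC"),
    ("table:unified_labs", "gov:derivedFrom", "study:CT-2021-001"),
]

def provenance(asset, triples):
    """Walk gov:derivedFrom edges back toward the original source system."""
    chain = []
    current = asset
    while True:
        parents = [o for s, p, o in triples
                   if s == current and p == "gov:derivedFrom"]
        if not parents:
            break
        current = parents[0]
        chain.append(current)
    return chain

print(provenance("table:unified_labs", triples))
# ['study:CT-2021-001', 'source:legacy_EDC']
```

In practice such a graph would live in an RDF store and be queried with SPARQL; the sketch only shows why a triple-shaped metadata model makes lineage and provenance queries mechanical.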


Open Data on GitHub: Unlocking the Potential of AI

Roman, Anthony Cintron, Xu, Kevin, Smith, Arfon, Vega, Jehu Torres, Robinson, Caleb, Ferres, Juan M Lavista

arXiv.org Artificial Intelligence

GitHub is the world's largest platform for collaborative software development, with over 100 million users. GitHub is also used extensively for open data collaboration, hosting more than 800 million open data files, totaling 142 terabytes of data. This study highlights the potential of open data on GitHub and demonstrates how it can accelerate AI research. We analyze the existing landscape of open data on GitHub and the patterns of how users share datasets. Our findings show that GitHub is one of the largest hosts of open data in the world and has experienced an accelerated growth of open data assets over the past four years. By examining the open data landscape on GitHub, we aim to empower users and organizations to leverage existing open datasets and improve their discoverability -- ultimately contributing to the ongoing AI revolution to help address complex societal issues. We release the three datasets that we have collected to support this analysis as open datasets at https://github.com/github/open-data-on-github.


Metadata driven development realises "smart manufacturing" of data ecosystems – blog 3 - Solita Data

#artificialintelligence

This is the third part of the blog series. The 1st blog focused on the maturity model and explained how the large monolith data warehouses were created. The 2nd blog focused on metadata driven development or "smart manufacturing" of data ecosystems. This 3rd blog will talk about reverse engineering, or how existing data assets can be discovered to accelerate the development of new data products. Companies are under increasing pressure to address data silos to reduce cost, improve agility and accelerate innovation, but they struggle to deliver value from their data assets. Many companies have hundreds of systems, containing thousands of databases, hundreds of thousands of tables, millions of columns, and millions of lines of code across many different technologies. The starting point is a "data spaghetti" that nobody knows well.
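Reverse engineering that "data spaghetti" typically starts by harvesting metadata from the systems themselves rather than from documentation. A toy sketch of the idea, using an in-memory SQLite database as a stand-in for a real enterprise system (production tooling would query each engine's own catalog, e.g. `information_schema` in most SQL databases):

```python
import sqlite3

# Toy stand-in for a production database with a couple of tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT, email TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL);
""")

# Enumerate tables, then columns per table, building a metadata inventory.
inventory = {}
for (table,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"):
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    inventory[table] = cols

print(inventory)
# {'customers': ['id', 'name', 'email'], 'orders': ['id', 'customer_id', 'total']}
```

Scanning hundreds of systems this way yields the raw inventory that discovery and cataloging tools then enrich with usage and lineage information.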


The many layers of data lineage. What can we learn from google maps to…

#artificialintelligence

Having a map showing how data evolves from its sources to its destination is the dream of any organisation. Like the gold rush, everyone is after that tool connecting columns, tables and dashboards within the warehouse. But like gold, this visualisation has always been considered a privilege in the data ecosystem. Defining the lineage has been a manual task not accessible to everyone. Usually, only the ones working daily with the data transformation processes are aware of the actual flow of data -- and typically this lineage is a mix of what's in their minds, documented information, and digging into different tools' metadata.
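Underneath the map metaphor, lineage is a directed graph from sources to dashboards, and the common questions ("what breaks if this column changes?") are graph traversals. A hedged sketch with invented asset names:

```python
from collections import deque

# Invented example lineage: each edge points from an upstream asset to the
# assets built from it (source -> staging -> warehouse -> dashboards).
downstream = {
    "crm.contacts": ["staging.contacts"],
    "staging.contacts": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["dashboard.churn", "dashboard.revenue"],
}

def impacted(asset):
    """Breadth-first walk of everything downstream of `asset`."""
    seen, queue = [], deque([asset])
    while queue:
        node = queue.popleft()
        for child in downstream.get(node, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen

print(impacted("crm.contacts"))
# ['staging.contacts', 'warehouse.dim_customer', 'dashboard.churn', 'dashboard.revenue']
```

Lineage tools differ mainly in how they populate the edges (parsing SQL, reading tool metadata, instrumenting pipelines); the traversal itself stays this simple.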


Director, Data Engineering at Visa - Bengaluru, India

#artificialintelligence

Visa is a world leader in digital payments, facilitating more than 215 billion payment transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable and secure payments network, enabling individuals, businesses and economies to thrive. When you join Visa, you join a culture of purpose and belonging – where your growth is a priority, your identity is embraced, and the work you do matters. We believe that economies that include everyone everywhere uplift everyone everywhere. Your work will have a direct impact on billions of people around the world – helping unlock financial access to enable the future of money movement.


What is Data Governance? Top Data Governance Tools for Data Science and Machine Learning Research in 2022

#artificialintelligence

Data governance is the process of developing internal data standards and enacting rules that govern who has access to data and how it is used in analytical applications and business operations. A good data governance program guarantees that data is reliable, consistent, and accessible, and that its use complies with applicable rules and regulations regarding data protection. In addition to master data management (MDM) projects, it frequently includes data quality improvement initiatives. Software of this type offers features that facilitate the formulation of data governance policies, the construction of business glossaries and data catalogs, data mapping and classification, workflow management, collaboration, and process documentation. Software for data governance can be used in conjunction with MDM, metadata management, and data quality solutions. Data governance aims to promote confident decisions supported by solid data resources. Building policies that define data ownership, duties, and delegates is the goal of data governance.
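The catalog and policy features listed above ultimately reduce to structured metadata about each asset: who owns it, how it is classified, and who may use it. A hypothetical illustration of such a record (field names are invented, not taken from any specific governance tool):

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """Hypothetical data-catalog record carrying governance metadata."""
    name: str
    owner: str                 # accountable data owner
    classification: str        # e.g. "public", "internal", "restricted"
    glossary_terms: list = field(default_factory=list)
    allowed_roles: list = field(default_factory=list)

    def can_access(self, role):
        # Restricted assets require an explicitly granted role.
        return self.classification == "public" or role in self.allowed_roles

entry = CatalogEntry(
    name="sales.customer_master",
    owner="alice@example.com",
    classification="restricted",
    glossary_terms=["Customer", "MDM golden record"],
    allowed_roles=["analyst", "steward"],
)
print(entry.can_access("analyst"))    # True
print(entry.can_access("marketing"))  # False
```

Real governance platforms add workflow, approval, and audit layers on top, but the ownership-plus-classification core looks much like this.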


Senior Analytics Engineer

#artificialintelligence

Pango Group, an Aura Company, helps customers monitor, manage, and protect against the risks associated with their identities and personal information in a digital world. Backed by WndrCo, Warburg Pincus and General Catalyst, Pango Group is dedicated to creating the world's most comprehensive portfolio of industry-leading cybersecurity solutions. Our vision is to become THE go-to resource for every cyber protection need individuals may face - today and in the future.


Sr Data Engineer

#artificialintelligence

As the world's leader in digital payments technology, Visa's mission is to connect the world through the most creative, reliable and secure payment network - enabling individuals, businesses, and economies to thrive. Our advanced global processing network, VisaNet, provides secure and reliable payments around the world, and is capable of handling more than 65,000 transaction messages a second. The company's dedication to innovation drives the rapid growth of connected commerce on any device, and fuels the dream of a cashless future for everyone, everywhere. As the world moves from analog to digital, Visa is applying our brand, products, people, network and scale to reshape the future of commerce. At Visa, your individuality fits right in.


A Dagster Crash Course

#artificialintelligence

Hey - I'm the head of engineering at Elementl, the company that builds Dagster. This post is my take on a crash-course introduction to Dagster. And if you want to support the Dagster Open Source project, be sure to star our GitHub repo. Dagster is a data orchestrator. Think of Dagster as a framework for building data pipelines, similar to how Django is a framework for building web apps.
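The framework idea behind an orchestrator like Dagster is that you declare steps and their upstream dependencies, and the engine figures out execution order and wires outputs to inputs. The toy scheduler below is plain Python, explicitly not Dagster's actual API, just a sketch of that orchestration concept:

```python
# Conceptual sketch of an orchestrator: register steps with their upstream
# dependencies, then execute in dependency order, passing outputs along.
# This is NOT Dagster's API -- all names here are invented.
pipeline = {}

def step(*deps):
    def register(fn):
        pipeline[fn.__name__] = (fn, deps)
        return fn
    return register

@step()
def extract():
    return [1, 2, 3]

@step("extract")
def transform(rows):
    return [r * 10 for r in rows]

@step("transform")
def load(rows):
    return sum(rows)

def run(name, cache=None):
    """Recursively run upstream steps, caching results so each runs once."""
    cache = {} if cache is None else cache
    if name not in cache:
        fn, deps = pipeline[name]
        cache[name] = fn(*(run(d, cache) for d in deps))
    return cache[name]

print(run("load"))  # 60
```

Dagster layers a lot on top of this skeleton (typed assets, scheduling, observability, retries), which is exactly what makes it a framework rather than a script runner.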


Council Post: The Five Pitfalls Of Adopting AI In Financial Services And How To Avoid Them

#artificialintelligence

Suresh is a Data and AI Engineering lead for the financial services industry at Microsoft and a senior member of the IEEE Computer Society. The financial services industry (FSI) has been increasingly adopting artificial intelligence (AI) in recent years. The results of a recent survey by the Economist Intelligence Unit show that 85% of the respondents (banking IT leaders) have a "clear strategy" for using AI in product and service development. This is also evident in recent hiring trends in banks for AI-related jobs. It's great to see AI adoption at this scale, but it also makes it crucial for FSI leaders to watch for and avoid the following leading pitfalls in AI initiatives.